perf(crypto): stream field-element bytes into hashers and transcript #773
Open
Oppen wants to merge 2 commits into
Eliminates the per-element heap allocation in Merkle leaf hashing and Fiat-Shamir transcript appends. AsBytes gains a stream_bytes sink method, overridden zero-alloc for Goldilocks base and degree-3 extension; the Merkle backends and DefaultTranscript use it instead of as_bytes()/to_bytes_be(). Adds hash_data_parts + verify_merkle_path_from_hash so row-pair openings hash two slices directly instead of concatenating them into a throwaway Vec first; verify_composition_poly_opening and FieldElementVectorBackend::hash_data now delegate to their row-pair/parts siblings instead of duplicating the same hashing loop. Recursion guest: single-query 89.7M -> 73.7M cycles, multi-query 2.21B -> 1.82B cycles.
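A minimal sketch of the sink idea described above, not the PR's actual code. The trait shape, the `Base` type, and `ToyHasher` (an FNV-1a stand-in for a real digest) are assumptions made for illustration; only the names `AsBytes`, `as_bytes`, and `stream_bytes` come from the description.

```rust
/// Hypothetical stand-in for the crate's `AsBytes` trait (signature assumed).
pub trait AsBytes {
    /// Allocating path: returns a fresh Vec for every element.
    fn as_bytes(&self) -> Vec<u8>;

    /// Streaming path: hands the bytes to `sink` instead of returning them.
    /// The default falls back to the allocating path, so existing
    /// implementors keep working unchanged.
    fn stream_bytes(&self, sink: &mut dyn FnMut(&[u8])) {
        sink(&self.as_bytes());
    }
}

/// Toy 64-bit base-field element (hypothetical, not the real Goldilocks type).
#[derive(Clone, Copy)]
pub struct Base(pub u64);

impl AsBytes for Base {
    fn as_bytes(&self) -> Vec<u8> {
        self.0.to_be_bytes().to_vec()
    }

    /// Zero-alloc override: the 8 bytes live in a stack array.
    fn stream_bytes(&self, sink: &mut dyn FnMut(&[u8])) {
        sink(&self.0.to_be_bytes());
    }
}

/// Toy FNV-1a hasher standing in for a real `Digest` implementation.
pub struct ToyHasher(u64);

impl ToyHasher {
    pub fn new() -> Self {
        ToyHasher(0xcbf29ce484222325)
    }
    pub fn update(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= b as u64;
            self.0 = self.0.wrapping_mul(0x100000001b3);
        }
    }
    pub fn finalize(self) -> u64 {
        self.0
    }
}

/// Leaf hashing through the allocating path: one Vec per element.
pub fn hash_leaf_alloc<T: AsBytes>(row: &[T]) -> u64 {
    let mut h = ToyHasher::new();
    for e in row {
        h.update(&e.as_bytes());
    }
    h.finalize()
}

/// Leaf hashing through the sink: bytes go straight into the hasher.
pub fn hash_leaf_streamed<T: AsBytes>(row: &[T]) -> u64 {
    let mut h = ToyHasher::new();
    for e in row {
        e.stream_bytes(&mut |bytes: &[u8]| h.update(bytes));
    }
    h.finalize()
}
```

Both functions feed the hasher the same byte sequence, so the resulting digest is unchanged; only the per-element allocation differs.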
The degree-3 extension's stream_bytes was calling the base-field override three times, one per limb, each landing as its own Digest::update on the guest: three BlockBuffer::digest_blocks + memcpy calls instead of one. Disassembly of the guest ELF showed dyn dispatch itself was fully devirtualized by the #[inline(always)] chain, so the actual cost was the call count, not indirection. Reuse the existing write_bytes_be override (already zero-alloc, byte-identical layout) to fill a stack buffer and sink once. Recursion guest: single-query 73.9M -> 71.1M cycles, multi-query 1.82B -> 1.78B cycles.
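The batching change can be illustrated with a small sketch. `Base`, `Ext3`, the 8-byte big-endian limb encoding, and the signature given to `write_bytes_be` here are all assumptions for illustration; the real types and layout may differ.

```rust
/// Toy 64-bit base-field element (hypothetical).
pub struct Base(pub u64);

impl Base {
    /// Assumed analogue of `write_bytes_be`: writes 8 big-endian bytes
    /// into the first 8 bytes of `out` without allocating.
    pub fn write_bytes_be(&self, out: &mut [u8]) {
        out[..8].copy_from_slice(&self.0.to_be_bytes());
    }
}

/// Toy degree-3 extension element: three base-field limbs (hypothetical).
pub struct Ext3(pub [Base; 3]);

impl Ext3 {
    /// Before: one sink call per limb, so three hasher updates per element.
    pub fn stream_bytes_per_limb(&self, sink: &mut dyn FnMut(&[u8])) {
        for limb in &self.0 {
            sink(&limb.0.to_be_bytes());
        }
    }

    /// After: fill a 24-byte stack buffer, then call the sink once.
    pub fn stream_bytes_batched(&self, sink: &mut dyn FnMut(&[u8])) {
        let mut buf = [0u8; 24];
        for (limb, chunk) in self.0.iter().zip(buf.chunks_exact_mut(8)) {
            limb.write_bytes_be(chunk);
        }
        sink(&buf);
    }
}
```

Both variants emit the same 24 bytes in the same order; the batched one reaches the sink once instead of three times, which is the call-count saving the commit message describes.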
Summary
- `AsBytes` gains a `stream_bytes` sink method, overridden zero-alloc for the Goldilocks base field and its degree-3 extension, replacing `as_bytes()`/`to_bytes_be()` returning a fresh `Vec<u8>` per element per hash.
- Adds `FieldElementVectorBackend::hash_data_parts` + `verify_merkle_path_from_hash` so row-pair openings (`verify_opening_pair`, `verify_composition_poly_opening`, `verify_fri_layer_openings`) hash two slices directly instead of concatenating them into a throwaway `Vec` first; `verify_composition_poly_opening` and `hash_data` now delegate to their row-pair/parts siblings instead of duplicating the hashing loop.
- The degree-3 extension's `stream_bytes` was calling the base-field override three times (one per limb), landing as three separate `Digest::update` calls on the guest instead of one. Disassembly confirmed the `dyn FnMut` sink itself was fully devirtualized (no indirect-call cost), so the actual waste was call count. It now reuses the existing zero-alloc `write_bytes_be` override to fill a stack buffer and sinks once.

Test plan
- `cargo test --workspace --exclude math-cuda` (492 prover + 137 stark tests, all green) after each commit
- `make test-ethrex`
- `cargo test -p lambda-vm-prover --lib test_recursion_execute_1query -- --ignored --nocapture` (in-VM verify accepts, guest commits `vk_digest ‖ output`, byte-identical across every change)
- `make test-profile-recursion-single` / `-multi` cycle counts confirmed against baseline after each commit
- Guest ELF disassembly (`llvm-objdump`) confirming `stream_bytes` devirtualizes and the extension-field batching removes redundant `BlockBuffer::digest_blocks`/`memcpy` calls
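
The row-pair hashing from the summary can be sketched as follows. The function names `hash_data_parts` and `verify_merkle_path_from_hash` come from the PR, but their signatures, the `u64` element type, the FNV-1a `ToyHasher`, and the left/right node ordering are assumptions for illustration only.

```rust
/// Toy FNV-1a hasher standing in for a real digest (hypothetical).
pub struct ToyHasher(u64);

impl ToyHasher {
    pub fn new() -> Self {
        ToyHasher(0xcbf29ce484222325)
    }
    pub fn update(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= b as u64;
            self.0 = self.0.wrapping_mul(0x100000001b3);
        }
    }
    pub fn finalize(self) -> u64 {
        self.0
    }
}

/// Hashes several slices as if they were one contiguous row,
/// without building the concatenation first.
pub fn hash_data_parts(parts: &[&[u64]]) -> u64 {
    let mut h = ToyHasher::new();
    for part in parts {
        for e in *part {
            h.update(&e.to_be_bytes());
        }
    }
    h.finalize()
}

/// The older shape: concatenate into a throwaway Vec, then hash it.
pub fn hash_pair_concat(a: &[u64], b: &[u64]) -> u64 {
    let mut joined = Vec::with_capacity(a.len() + b.len());
    joined.extend_from_slice(a);
    joined.extend_from_slice(b);
    hash_data_parts(&[joined.as_slice()])
}

/// Hashes two child nodes into their parent (ordering is an assumption).
pub fn hash_nodes(left: u64, right: u64) -> u64 {
    let mut h = ToyHasher::new();
    h.update(&left.to_be_bytes());
    h.update(&right.to_be_bytes());
    h.finalize()
}

/// Walks a Merkle path starting from an already computed leaf hash.
pub fn verify_merkle_path_from_hash(
    root: u64,
    mut index: usize,
    leaf_hash: u64,
    path: &[u64],
) -> bool {
    let mut node = leaf_hash;
    for &sibling in path {
        // An even index means the current node is the left child.
        node = if index & 1 == 0 {
            hash_nodes(node, sibling)
        } else {
            hash_nodes(sibling, node)
        };
        index >>= 1;
    }
    node == root
}
```

Because the verifier takes a leaf hash rather than leaf data, a caller can compute `hash_data_parts(&[row, row_sym])` and pass the result straight in, so the two rows never need to be copied into one buffer.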